Dealing with Numeric Fields in Termination Analysis of Java-like Languages
Abstract
Termination analysis tools strive to find proofs of termination for as wide a class of (terminating) programs as possible. Although several tools can prove termination of non-trivial programs, a number of open problems remain when one tries to apply them to realistic programs. In the case of Java-like languages, one such problem is to find a practical way to prove termination when the termination behaviour of loops is affected by numeric fields. We have gathered statistics on the Java libraries to see how often this happens in practice, and we found that in 12.95% of loops the number of iterations (and therefore termination) explicitly depends on values stored in fields and that, in the vast majority of cases, such fields are numeric. Inspired by the examples found in the libraries, this paper identifies a series of difficulties that need to be solved in order to deal with numeric fields in termination analysis, and proposes some ideas towards a lightweight analysis which is able to prove termination of sequential Java-like programs in the presence of numeric fields.

1 Termination Analysis and Numeric Fields

Termination analysis tools strive to find proofs of termination for as wide a class of (terminating) programs as possible. Termination analysis is about the study of loops, which are the program constructs that may introduce non-termination. Loops may correspond to iterative constructs or to recursion. The boolean conditions which determine whether the loop should be executed again or not are called guards. Automated techniques for proving termination are typically based on analyses which track size information, such as the value of numeric data or array indexes, or the size of data structures. In particular, the analysis should keep track of how the (size of the) data involved in loop guards changes as the loop goes through its iterations.
This information is used for specifying a ranking function for the loop [14], which is a function which strictly decreases on a well-founded domain at each iteration of the loop, thus guaranteeing that the loop will be executed a finite number of times.

⋆ This work was funded in part by the Information Society Technologies program of the European Commission, Future and Emerging Technologies under the IST-15905 MOBIUS project, by the Spanish Ministry of Education (MEC) under the TIN-2005-09207 MERIT project, and the Madrid Regional Government under the S-0505/TIC/0407 PROMESAS project. S. Genaim was supported by a Juan de la Cierva Fellowship awarded by MEC.

In the last two decades, a variety of sophisticated termination analysis tools have been developed. Several analyses and tools exist, primarily for less widely used programming languages, including term rewrite systems [8], and logic and functional languages [11, 6, 10]. Termination-proving techniques are also emerging in the imperative paradigm [5, 7, 8], even for dealing with large industrial code [7].

Termination analysis of realistic object-oriented programming languages faces new difficulties due to the existence of advanced features such as exceptions, virtual method invocation, references, heap-allocated data structures, objects, and fields. Focusing on Java, termination analyzers for Java bytecode programs [1] and for Java source [9] are being developed which are able to accurately handle a good number of the features mentioned above. However, interesting open problems still remain. In particular, it is well known that the heap poses important difficulties to static analysis. One reason for this is that the heap is a global data structure whose contents are not accessed using named variables, but rather using (possibly chained) references.
Therefore, the same location in the heap may be modified using different aliased references and, furthermore, references may be reassigned several times, and thus they may point to different locations during execution. When loop guards involve information stored in the heap, such as object fields, tracking size information becomes rather complex, and accurate aliasing information is required in order to track all possible updates of the corresponding fields (see e.g. [12]).

A partial solution to this problem is already provided by the path-length domain [9], which allows proving termination of loops which traverse acyclic heap-allocated data structures (i.e., linked lists, trees, etc.). Path-length is an abstract domain which, for reference values, provides a safe approximation of the length of the longest reference chain reachable from them. Unfortunately, though the path-length domain is a useful abstraction for fields which contain references, it does not capture any information about fields which contain numbers.

In this work we look into the Sun implementation of the Java libraries for J2SE 1.4.2 in order to estimate how often loop termination depends on numeric values stored in fields, and to try to come up with sufficient conditions for termination which are able to cover a large fraction of those loops whose termination is not provable using current techniques, such as those in [9, 1].

2 Motivating Examples from the Java Libraries

Since termination is an undecidable problem, all techniques for proving termination provide sufficient (but not necessary) conditions for termination. Therefore, for any termination-proving technique it is possible to find terminating programs for which the technique fails to prove termination. Thus, the practicality of termination analyses is usually measured by applying them to a representative set of real programs.
In this work, the design of the analysis is driven by common programming patterns for loops that we have found in the Java libraries. By looking at Sun's implementation of the J2SE (version 1.4.2_13) libraries, which contain 71432 methods, we have found 7886 loops (for, while, and do), of which 1021 (12.95%) explicitly involve fields in their guards. By inspecting these 1021 loops, we have observed, among others, the following three kinds of common patterns.

Pattern #1: Loops in this category use numeric fields as bounds for loop counters and, moreover, the value of those fields is not updated within the loop. This is demonstrated in the following loop of the method public void or(BitSet set) of library java.util.BitSet, where unitsInUse is a field of type int:

    for(; i<set.unitsInUse; i++) bits[i]=set.bits[i];

Pattern #2: Loops in this category are similar to those in the previous category. The difference is that, rather than corresponding to the value of a numeric field, the bound of the loop counter corresponds to the length of an array which is stored in a field. In this case, even if the elements of the array are updated within the loop, the length of the array remains constant as long as the field itself is not updated. This is demonstrated in the following example, corresponding to method public void fixupVariables(java.util.Vector vars, int globalsSize) of library org.apache.xpath.functions.FunctionMultiArgs, where m_args is a field of type Expression[]:

    for(int i=0; i<m_args.length; i++) m_args[i].fixupVariables(vars,globalsSize);

Pattern #3: Loops in this category use numeric fields as loop counters, which means that the field value is updated within the loop, but none of the references in the path to the field (in this example, the chain just consists of the reference this) are reassigned within the loop, i.e., all updates correspond to the same object on the heap. This is demonstrated in the following loop of the method public synchronized void setLength(int newLength) in the library java.lang.StringBuffer, in which count is a field of type int:

    for(; count<newLength; count++) value[count] = '\0';

In this paper we concentrate on proving termination of loops that fall in the above categories by providing (uniform) conditions under which proving termination of such loops becomes possible. The Java libraries also include other patterns, such as loops that: (1) increase/decrease an integer variable until it reaches a given upper/lower bound; (2) traverse an acyclic data structure or an array; (3) look for an element in an input stream, which is common in classes that manipulate structured text, such as XML parsers; and (4) look for a non-null element in a given array in a circular way, which is very common in the multi-threading classes. Patterns (1) and (2) account for the majority of the loops, and they are already handled in [1]. Patterns (3) and (4) are planned for future research and are not addressed in this paper.

3 Dealing with Fields in Termination

In a Java-like language, objects are stored in the heap and they are accessed by means of references (or pointers). References can take the value null or point to an object in the heap. Given a reference l which points to an object o, l.f denotes the value of the field f in the object o. We say that a syntactic construction of the form l.f is a field access. Each field f has a unique signature, which consists of the class where it is declared, its type, and its name.

Objects are global in that they survive the execution of methods. Typically, when a method starts execution, a large number of objects may exist in the heap. One approach to analyzing programs with objects is to compute an abstraction of the heap (see [13]) which approximates the execution context of each method.
This usually requires computing abstractions of all possible objects in the program, which might turn out to be too expensive in practice if one wants to deal with real programs. However, in most cases, only a small fraction of such objects affects the execution of the method. We seek a more lightweight approach which tries to approximate the contents of only a subset of the objects in the heap. The approach must remain correct by making safe assumptions about the objects (and fields) whose contents are not taken into consideration.

Another disadvantage of computing an abstraction of the heap, in addition to its computational complexity, is that we end up obtaining termination information which is context-dependent. Though context-dependent analysis is in principle more precise, the results obtained cannot be extrapolated to other execution contexts. In particular, in the case of libraries, ideally we would like to prove termination in a context-independent way, i.e., regardless of what the contents of the heap are when the method is executed.

We now introduce the concept of local field access. In particular, we are interested in finding field accesses which are local to a loop. Though termination analysis in our context aims at proving termination of methods, in the rest of the paper we will concentrate on loops since they are the main subject of termination analysis.

Definition 1 (local field access). We say that a field access l.r1. ... .rn.f, where f is a numeric field, is local to a loop L if
(i) No prefix of l.r1. ... .rn changes its value within L, i.e., they remain constant.
(ii) If the value of l.r1. ... .rn.f changes within L, then all write accesses have to be done explicitly through the field access l.r1. ... .rn.f.

Condition (i) guarantees that all occurrences of the field access within the loop refer to the same memory location in the heap. Note that the prefixes of l.r1. ... .rn, i.e., l, l.r1, l.r1.r2, ..., are references which together form a chain to the object where the numeric field f is stored. Condition (ii) guarantees that all write accesses to the field can be syntactically identified. Note that this condition can be violated due to aliasing, since different field accesses may update the same memory location.

Given a loop L, we denote by g-fields(L) the set of field accesses l.r1. ... .rn.f, where f is a numeric field, which explicitly appear inside the guard of L. For instance, for the three loops in Section 2, the sets g-fields(L) are, respectively, {set.unitsInUse}, {m_args.length} and {this.count}. These three fields are locally accessed within their corresponding loops. The practical implication is that, if we ensure that a field in g-fields(L) is local, then we can treat this field as if it were a local variable for the purposes of analyzing L. Essentially, given a loop L, the analysis proceeds as follows:

1. Compute the set g-fields(L).
2. Compute the set l-g-fields(L), which is the subset of g-fields(L) containing the field accesses whose locality condition has been proved.
3. Analyze the termination of L by considering the field accesses in l-g-fields(L) as if they were local variables.

The method is applied locally to all nested loops in L. Note that the termination of a method is ensured if all loops involved in its body are terminating. By involved we mean not only those loops occurring explicitly in the body but also those coming from possible calls to other methods.

3.1 Syntactic Inference of the Locality Condition on Field Accesses

The above approach is practical only if we provide effective mechanisms to prove the locality condition on field accesses. In this section, we consider only loops that do not contain method invocations. Later, in Section 4, we take method invocations into account. Now, we present sufficient syntactic conditions for ensuring that a field access is local.
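As a rough illustration of steps 1 and 2 of the procedure above, the following sketch computes l-g-fields(L) from a hand-written summary of a loop, using a deliberately conservative test in the spirit of the syntactic conditions that this section goes on to state. It is not the implementation used in this work. The Loop summary class, the encoding of field accesses as dot-separated strings, and the use of the bare field name as a stand-in for the full field signature are all assumptions made for brevity. Array element stores such as value[count] = '\0' are left out of the summary because they assign neither a variable nor a field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LocalityCheck {

    /** Hypothetical loop summary: guard field accesses and body assignment targets. */
    static final class Loop {
        final List<String> guardAccesses;    // g-fields(L), e.g. "this.count"
        final List<String> assignedTargets;  // e.g. "l" (variable) or "l.size" (field)

        Loop(List<String> guardAccesses, List<String> assignedTargets) {
            this.guardAccesses = guardAccesses;
            this.assignedTargets = assignedTargets;
        }
    }

    /** Conservative test: true only if no assignment in the loop could break locality. */
    static boolean isLocal(String access, Loop loop) {
        String[] parts = access.split("\\.");
        if (parts.length < 2) {
            throw new IllegalArgumentException("not a field access: " + access);
        }
        String root = parts[0];                  // the reference variable l
        String field = parts[parts.length - 1];  // the numeric field f
        // Intermediate reference fields r1..rn (empty for an access like l.f).
        List<String> chain = Arrays.asList(parts).subList(1, parts.length - 1);

        for (String target : loop.assignedTargets) {
            String[] t = target.split("\\.");
            String last = t[t.length - 1];
            if (t.length == 1) {
                if (last.equals(root)) return false;   // l itself is reassigned
            } else if (chain.contains(last)) {
                return false;                           // some ri may be reassigned (via any reference)
            } else if (last.equals(field) && !target.equals(access)) {
                return false;                           // f written through a possibly aliased access
            }
        }
        return true;
    }

    /** Step 2: the subset of g-fields(L) whose locality could be established. */
    static List<String> localGuardFields(Loop loop) {
        List<String> result = new ArrayList<>();
        for (String access : loop.guardAccesses) {
            if (isLocal(access, loop)) result.add(access);
        }
        return result;
    }

    public static void main(String[] args) {
        // for(; this.count<newLength; this.count++) value[this.count] = '\0';
        Loop setLength = new Loop(Arrays.asList("this.count"), Arrays.asList("this.count"));
        // while (l.size < 10) { l.size++; l2.size--; }  where l2 may alias l
        Loop aliased = new Loop(Arrays.asList("l.size"), Arrays.asList("l.size", "l2.size"));
        System.out.println(localGuardFields(setLength)); // prints [this.count]
        System.out.println(localGuardFields(aliased));   // prints []
    }
}
```

With these summaries, the setLength loop from Section 2 keeps this.count in l-g-fields(L), so step 3 may treat it as a local variable, whereas the loop that also writes size through a second reference (l2, which may alias l) is rejected.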
The following conditions ensure that a numeric field access l.r1. ... .rn.f is local to a loop L:

1. The reference variable l remains constant in L. This can be ensured by checking that there is no assignment to l within L.
2. All reference fields l.r1, ..., l.r1...rn are constant in L. This can be ensured by checking that there is no assignment within L to a field with the same signature as any of the ri.
3. All assignments to a field with the same signature as f in L are done through the field access l.r1. ... .rn.f.

Let us briefly explain each of the above conditions. Conditions 1 and 2 ensure point (i) of Definition 1. We separate it into two conditions because each is checked syntactically in a different way. For the reference variable l, we check that there is no assignment to it. This check guarantees that we do not incorrectly consider a loop of the form

    while (l.size < 10) {l.size++; l=new C(); }

as terminating. Note that this loop is not guaranteed to terminate, since reassigning l potentially changes the location of size and hence its value. Condition 2 guarantees that we do not change any of the intermediate reference fields l.r1, ..., l.r1...rn. Note that if a reference field l.r1...ri is modified, then we fail to ensure constancy of the local field access. For instance, we would fail to prove termination of the loop

    while (l.r1.size < 10) {l.r1.size++; l'.r1=z; }

This is a safe assumption since, without knowledge about the aliasing of l and l', we might be changing the reference to size. Condition 3 is a sufficient condition to ensure that the field is not updated due to possible aliasing with another object (point (ii) in Definition 1). This condition is not satisfied in a loop of the form

    while (l.size < 10) {l.size++; l'.size--; }

and therefore we do not prove termination for it. This is reasonable, as l and l' might be aliased during the execution.

Example 1. Reconsider the third loop in Section 2. For clarity, we rewrite the accesses to the field count so that they explicitly include the this path variable:

    for(; this.count<newLength; this.count++) value[this.count] = '\0';

We can prove that this.count is local to the loop by checking the syntactic conditions stated above: the reference this does not change, and all updates to this.count are done through the field access this.count. The key point is that, since this.count is local, we can safely treat it as a local variable. Consequently, existing termination analyzers [3] are able to infer that this.count increases at each iteration. Besides, as newLength remains constant in the loop, the analyzer finds that newLength-this.count is a decreasing well-founded measure, and thus termination is guaranteed.

4 Termination with (Virtual) Method Invocations

In this section, we address the more challenging problem of proving the termination of loops which contain method invocations. As notation, we denote by M(L) the set of methods transitively invoked within the scope of a loop L. We now study the conditions that the methods in M(L) must satisfy in order to preserve the locality condition on g-fields(L).

Consider a method m invoked within L. We distinguish three possible scenarios. In the first two, the implementation of m is available at analysis time, and thus we can apply the techniques for detecting local field accesses to the code of m. As our method is purely syntactic, in order to check the conditions on m, we must first rename the formal parameters of m to the variables in the call, as parameter passing does. Note that, when a method m is invoked from a reference l, the this reference in m is renamed to l in order to check the conditions. In the first scenario, method m does not modify the value of the (numeric) field, whereas in the second one it does.
In the third one, the implementation of m is either not available (i.e., it is an abstract or native method) or it has been redefined by means of subclassing. We aim at proving modular termination of the loop by making assumptions on m. We study these scenarios in more detail below.

Scenario 1. Consider method test1 at the top of the right-hand column in Fig. 1. Due to dynamic dispatching, the execution of a.m1() can correspond to method
Publication date: 2008